Project 2 (EAS 509: Fall’23)
Project Title: Time Series Analysis &
Forecasting of Oil Sales
Team Members:
- Sujay Shrivastava (50496221) (sujayshr)
- Utkarsh Mathur (50495131) (umathur)
- Venkata Lakshmi Krishna Tejaswi Gudimetla (50496378) (vgudimet)
Importing Libraries and Data
# Visualization and Analysis
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggdendro)
library(readxl)
library(plotly)
suppressWarnings({
library(readxl)
})
# Modeling and Inference
library(TSA)
library(forecast)
library(astsa)
# Importind Data.
# NOTE: Please store the oil.csv file in the directory of this file.
df = read.csv('oil.csv')
summary(df)
date dcoilwtico
Length:1218 Min. : 26.19
Class :character 1st Qu.: 46.41
Mode :character Median : 53.19
Mean : 67.71
3rd Qu.: 95.66
Max. :110.62
NA's :43
From the above summary we can observe that there are 1218 datapoints
in out dataset and 43 missing values. We will address the missing values
later on.
Plotting Original Data
plot1 <- df |>
plot_ly(type="scatter", mode="lines") |>
add_trace(x = ~date, y = ~dcoilwtico, name="Daily Sale") |>
layout(showlegend=FALSE, plot_bgcolor = "white")
options(warn=-1)
plot1
Warning: Can't display both discrete & non-discrete data on same axisWarning: Can't display both discrete & non-discrete data on same axis
Imputing Missing Value
Based on the analysis of the plot we have decided to impute missing
value with the next value observed in the time series.
df_new <- df %>% fill(dcoilwtico, .direction="up")
summary(df_new)
date dcoilwtico
Length:1218 Min. : 26.19
Class :character 1st Qu.: 46.42
Mode :character Median : 53.19
Mean : 67.67
3rd Qu.: 95.59
Max. :110.62
We now plot the imputed dataset.
plot2 <- df_new |>
plot_ly(type="scatter", mode="lines") |>
add_trace(x = ~date, y = ~dcoilwtico, name="Daily Sale") |>
layout(showlegend=FALSE, plot_bgcolor = "white")
options(warn=-1)
plot2
Warning: Can't display both discrete & non-discrete data on same axisWarning: Can't display both discrete & non-discrete data on same axis
Upon plotting the imputed dataset we can observe that there is no
trend or seasonality observed over time. This is a very good instance to
test Exponential Smoothing methods.
components_df <- decompose(df_new)
Error in decompose(df_new) : time series has no or less than 2 periods
As it can be seen in the above code piece the data has no seasonality
as the time series in not decomposible.
ETS and Holt-Winters Models
Forecasts produced using exponential smoothing methods are weighted
averages of past observations, with the weights decaying exponentially
as the observations get older. In other words, the more recent the
observation the higher the associated weight.
Simple Exponential Smoothing
The simplest of exponentially smoothing methods is Simple Exponential
Smoothing (SES). This method is suitable for forecasting data with no
clear trend or seasonal pattern. For simple exponential smoothing, the
only component included is the level as observed in the below
forecasting and smoothing equation. The smoothing equation only uses
alpha to include the trend component of Time Series.

Holt-Winters Methods
Holt (1957) and Winters (1960) extended Holt's method to capture
seasonality. The Holt-Winters seasonal method comprises the forecast
equation and three smoothing equations — one for the level \(l_t\) one for the trend \(b_t\), and one for the seasonal component
\(s_t\) ,with corresponding smoothing
parameters \(\alpha\), \(\beta^*\), and \(\gamma\) respectively.
Additive Method:

Multiplicative Method:

Model Training
As the data doesn’t have seasonality in it (evident from the
decomposition of data) we can safely say that Holt-Winters method is of
lesser use here. So we are going to compare 2 models:
- ARIMA (trained without specifying p, d, and q)
- Exponential Smoothing
ARIMA Model
modelARIMA <- auto.arima(df_new['dcoilwtico'])
summary(modelARIMA)
Series: df_new["dcoilwtico"]
ARIMA(0,1,0)
sigma^2 = 1.449: log likelihood = -1952.37
AIC=3906.73 AICc=3906.73 BIC=3911.83
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set -0.03759184 1.203095 0.8949041 -0.08030733 1.547907 0.9992644 -0.04614975
As observed, the best fit ARIMA model is ARIMA(p=0, d=1, q=0) with
AICc3906.73 and RMSE 1.203095.
fcARIMA <- forecast(modelARIMA, h=100)
autoplot(fcARIMA)

Exponential Smoothing model
dfETS <- as.ts(df_new['dcoilwtico'])
modelETS <- ets(dfETS)
summary(modelETS)
ETS(A,N,N)
Call:
ets(y = dfETS)
Smoothing parameters:
alpha = 0.954
Initial states:
l = 93.1356
sigma: 1.2028
AIC AICc BIC
9107.716 9107.735 9123.031
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set -0.03953028 1.20184 0.8965985 -0.08424058 1.551493 1.001156 -0.0008160611
The best fitted Exponential Smoothing model has alpha 0.954, and its
has AICc of 9107.735 and RMSE score is 1.20184 which is better than the
performance of best fitted ARIMA model.
fcETS <- forecast(modelETS, h=100)
autoplot(fcETS)

Diagnosis and Forecasting Analysis
Upon training the dataset on ARIMA and ETS models we observe that
they yield similar RMSE scores. This metric is further supported in the
behavior of the 100 days forecast plots.
However, the RMSE score of the Exponential Smoothing (1.20184) is
better than ARIMA model (1.203095) so, for our data the the Exponential
Smoothing model (with alpha=0.954, beta=0, and gamma=0) is a better
model than ARIMA (p=0, d=1, q=0).
---
title: "R Notebook"
output: html_notebook
---

# Project 2 (EAS 509: Fall'23)

### [**Project Title**]{.underline}: Time Series Analysis & Forecasting of Oil Sales

### [Team Members]{.underline}:

1.  Sujay Shrivastava (50496221) (sujayshr)
2.  Utkarsh Mathur (50495131) (umathur)
3.  Venkata Lakshmi Krishna Tejaswi Gudimetla (50496378) (vgudimet)

------------------------------------------------------------------------

## Importing Libraries and Data

```{r}
# Visualization and Analysis
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggdendro)
library(readxl)
library(plotly)
suppressWarnings({
  library(readxl)
})

# Modeling and Inference
library(TSA)
library(forecast)
library(astsa)
```

```{r}
# Importind Data.
# NOTE: Please store the oil.csv file in the directory of this file.
df = read.csv('oil.csv')
summary(df)
```

From the above summary we can observe that there are 1218 datapoints in out dataset and 43 missing values. We will address the missing values later on.

------------------------------------------------------------------------

## Plotting Original Data

```{r}
plot1 <- df |>
  plot_ly(type="scatter", mode="lines") |>
  add_trace(x = ~date, y = ~dcoilwtico, name="Daily Sale") |>
  layout(showlegend=FALSE, plot_bgcolor = "white")
options(warn=-1)

plot1
```

------------------------------------------------------------------------

## Imputing Missing Value

Based on the analysis of the plot we have decided to impute missing value with the next value observed in the time series.

```{r}
df_new <- df %>% fill(dcoilwtico, .direction="up")
summary(df_new)
```

We now plot the imputed dataset.

```{r}
plot2 <- df_new |>
  plot_ly(type="scatter", mode="lines") |>
  add_trace(x = ~date, y = ~dcoilwtico, name="Daily Sale") |>
  layout(showlegend=FALSE, plot_bgcolor = "white")
options(warn=-1)

plot2
```

Upon plotting the imputed dataset we can observe that there is no trend or seasonality observed over time. This is a very good instance to test Exponential Smoothing methods.

```{r}
components_df <- decompose(df_new)
```

As it can be seen in the above code piece the data has no seasonality as the time series in not decomposible.

------------------------------------------------------------------------

## ETS and Holt-Winters Models

Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation the higher the associated weight.

### Simple Exponential Smoothing

The simplest of exponentially smoothing methods is Simple Exponential Smoothing (SES). This method is suitable for forecasting data with no clear trend or seasonal pattern. For simple exponential smoothing, the only component included is the level as observed in the below forecasting and smoothing equation. The smoothing equation only uses alpha to include the trend component of Time Series.

![](images/Screenshot%202023-12-06%20220120.png)

### Holt-Winters Methods

Holt (1957) and Winters (1960) extended Holt\'s method to capture seasonality. The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations --- one for the level $l_t$ one for the trend $b_t$, and one for the seasonal component $s_t$ ,with corresponding smoothing parameters $\alpha$, $\beta^*$, and $\gamma$ respectively.

#### Additive Method:

![](images/Screenshot%202023-12-06%20221048.png)

#### Multiplicative Method:

![](images/Screenshot%202023-12-06%20221107.png)

------------------------------------------------------------------------

## Model Training

As the data doesn't have seasonality in it (evident from the decomposition of data) we can safely say that Holt-Winters method is of lesser use here. So we are going to compare 2 models:

1.  ARIMA (trained without specifying p, d, and q)
2.  Exponential Smoothing

### ARIMA Model

```{r}
modelARIMA <- auto.arima(df_new['dcoilwtico'])
summary(modelARIMA)
```

As observed, the best fit ARIMA model is ARIMA(p=0, d=1, q=0) with AICc3906.73 and RMSE 1.203095.

```{r}
fcARIMA <- forecast(modelARIMA, h=100)
autoplot(fcARIMA)
```

### Exponential Smoothing model

```{r}
dfETS <- as.ts(df_new['dcoilwtico'])
modelETS <- ets(dfETS) 
summary(modelETS)
```

The best fitted Exponential Smoothing model has alpha 0.954, and its has AICc of 9107.735 and RMSE score is 1.20184 which is better than the performance of best fitted ARIMA model.

```{r}
fcETS <- forecast(modelETS, h=100)
autoplot(fcETS)
```

### Diagnosis and Forecasting Analysis

Upon training the dataset on ARIMA and ETS models we observe that they yield similar RMSE scores. This metric is further supported in the behavior of the 100 days forecast plots.

However, the RMSE score of the Exponential Smoothing (1.20184) is better than ARIMA model (1.203095) so, for our data the the Exponential Smoothing model (with alpha=0.954, beta=0, and gamma=0) is a better model than ARIMA (p=0, d=1, q=0).

------------------------------------------------------------------------

## References

1.  <https://www.rdocumentation.org/>
2.  Hyndman, R.J., & Athanasopoulos, G. (2021) *Forecasting: principles and practice*, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on 4 December 2023. (<https://otexts.com/fpp3/>) \*\*
3.   <https://robjhyndman.com/expsmooth/> (Accessed on 4 December 2023)
4.  <https://medium.com/analytics-vidhya/a-thorough-introduction-to-holt-winters-forecasting-c21810b8c0e6> (Accessed on 4 December 2023)
5.  <https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/> (Accessed on 5 December 2023)
6.  <https://www.r-bloggers.com/2023/01/imputation-in-r-top-3-ways-for-imputing-missing-data/> (Accessed on 5 December 2023)
7.  <https://tidyr.tidyverse.org/reference/fill.html> (Accessed on 6 December 2023)

\*\* We used the exact text from this book to maintain the accuracy of theoretical information provided.
